Abstract

In this paper we discuss the benefit of parallel computing in propagating the orbits of objects. Several analytic methods are now in use operationally, and we discuss three such schemes. We demonstrate the benefit of parallelism using an INTEL iPSC/2 hypercube and a cluster of Unix-based workstations running Parallel Virtual Machine (PVM). The PVM software allows a heterogeneous set of networked workstations to appear as a multicomputer.

We  will  show  that  one  can  achieve  near 
100%  efficiency  on  the  hypercube. 


* Author to whom all correspondence should be addressed.

Report Documentation Page (Standard Form 298, Rev. 8-98, prescribed by ANSI Std Z39-18; Form Approved, OMB No. 0704-0188)

Report date: 1994
Title: Performance of Analytic Orbit Propagators on a Hypercube and a …
Performing organization: Naval Postgraduate School, Department of Mathematics, Code MA/Nd, Monterey, CA 93943
Distribution/availability: Approved for public release, distribution unlimited
Security classification (report, abstract, this page): unclassified
Limitation of abstract: UU
Number of pages: 8


1 Introduction

The Naval Space Command (NAVSPACECOM) and the Air Force Space Command (AFSPACECOM) currently track over 6000 objects in elliptical orbits around the Earth every day. To assist in identifying and tracking these objects, both commands use an analytic satellite motion model. The Navy uses the subroutine PPT2, which is based on a variation-of-elements model of artificial satellite motion around the Earth; the theory is due to Brouwer and Lyddane [9]. Given a satellite's set of "mean" orbital elements at a given epoch, the model predicts the state (position and velocity) vector at a future time. The model considers perturbing accelerations due to atmospheric drag, the oblateness of the Earth, and the asymmetry of the Earth's mass about the equatorial plane. The Air Force uses SGP4/SDP4 (Simplified General Perturbations), based on the theory of Lane and Cranford [8]; the deep-space capabilities are due to Hujsak's work [6]. These replaced the older SGP model, which was based on the work of Kozai [7] and Brouwer [2] and made operational by Hilton and Kuhlman [5]. The old version had no capability to track objects in "deep space", i.e. objects with periods greater than 225 minutes.

With the current increase in space operations, the number of objects that must be tracked is expected to grow substantially. Moreover, any increase in prediction accuracy would require a more elaborate model, demanding still more computing resources and making results even more time consuming to obtain.

Parallel computing offers one way to decrease the computation time and move closer to real-time results. Parallel computers have already proven beneficial in reducing computation time in many other applied areas.

Two common measures of effectiveness, accounting for both the hardware and the algorithm, are speedup and efficiency. The speedup, $S_p$, of an algorithm is defined as

$$S_p = \frac{T_s}{T_p} \tag{1}$$

where $T_s$ is the time on a serial computer and $T_p$ is the time on a parallel computer having $p$ processors. The efficiency, $E_p$, is defined by

$$E_p = \frac{S_p}{p} \tag{2}$$

It accounts for the relative cost of achieving a given speedup. Many factors can limit the efficiency of a parallel program, including the number of sequential operations that cannot be parallelized, the communication time between processors, and the time each processor is idle because of synchronization requirements (see, e.g., Quinn [13]).
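As a minimal illustration of definitions (1) and (2), the helper functions below compute speedup and efficiency from measured times. The function names and the sample timings are hypothetical, not measurements from this study.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """S_p = T_s / T_p, equation (1)."""
    if t_parallel <= 0:
        raise ValueError("parallel time must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, p: int) -> float:
    """E_p = S_p / p, equation (2): the fraction of the ideal p-fold speedup achieved."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return speedup(t_serial, t_parallel) / p


# Hypothetical timings: 10 s serially, 2.5 s on 5 processors.
print(speedup(10.0, 2.5))        # 4.0
print(efficiency(10.0, 2.5, 5))  # 0.8
```

An efficiency of 1 would mean every processor contributes fully; the limiting factors listed above are what pull it below 1.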

Two decomposition strategies can be used to parallelize an algorithm: control decomposition and domain (data) decomposition. Phipps et al. [11] showed that control decomposition is inefficient for orbit computation with the analytic methods mentioned above.

In this paper we summarize the results of parallelizing the analytic orbit propagators with a domain decomposition strategy on the INTEL iPSC/2 hypercube. We also discuss the use of a cluster of networked Unix-based workstations, all running the Parallel Virtual Machine (PVM) software. PVM was developed at Oak Ridge National Laboratory; it is a software system that enables a collection of heterogeneous computers to be used as a coherent and flexible concurrent computational system (Geist et al. [4]). In the next section we discuss the results of parallelization on the INTEL hypercube. Section 3 gives a brief introduction to the PVM software. The results of parallelizing PPT2 on a cluster of workstations are detailed in Section 4, and Section 5 discusses the use of PVM in parallelizing the Air Force models. Our conclusions are given in Section 6.

2 Parallel Versions of PPT2, SGP, and SGP4/SDP4

In this section we discuss the parallelization of PPT2 as well as SGP and SGP4/SDP4. The idea is to let one processor read the data and distribute it to $(p-2)$ working processors, which propagate the orbits and send their results to another processor, the collector, which writes them to disk (see Figure 1).
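The distributor/worker/collector arrangement can be sketched with Python threads and queues standing in for hypercube nodes and messages. This is a structural illustration under stated assumptions: `propagate` is a hypothetical placeholder rather than PPT2 or SGP4, threads replace separate processors (so Python's global interpreter lock means no real speedup), and the stop-token protocol is our own choice, not the iPSC/2 code.

```python
import queue
import threading

_STOP = object()  # sentinel marking the end of a stream of messages


def propagate(sat_id):
    """Hypothetical stand-in for one analytic propagation (not PPT2 or SGP4)."""
    return sat_id, 2.0 * sat_id


def run_pipeline(satellites, p):
    """One distributor, p - 2 workers and one collector, connected by queues."""
    if p < 3:
        raise ValueError("need p >= 3: distributor, at least one worker, collector")
    n_workers = p - 2
    work, results = queue.Queue(), queue.Queue()
    collected = []

    def distributor():
        for sat in satellites:          # "read and distribute the data"
            work.put(sat)
        for _ in range(n_workers):      # one stop token per worker
            work.put(_STOP)

    def worker():
        while (sat := work.get()) is not _STOP:
            results.put(propagate(sat))
        results.put(_STOP)              # tell the collector this worker is done

    def collector():
        done = 0
        while done < n_workers:
            item = results.get()
            if item is _STOP:
                done += 1
            else:
                collected.append(item)  # stands in for writing to disk

    threads = [threading.Thread(target=distributor), threading.Thread(target=collector)]
    threads += [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(collected)


print(run_pipeline(range(5), p=4))  # [(0, 0.0), (1, 2.0), (2, 4.0), (3, 6.0), (4, 8.0)]
```

Because each worker sends its stop token only after all of its own results, the collector can safely finish once it has seen one token per worker.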
The results for n satellites, 36 < n < 10000, are given in Table 1 for a hypercube of 8 processors.

P3T, the parallel version of PPT2, is clearly the most efficient. This is not surprising, since PPT2 requires more computation time (11.2 msec) than the other propagators. Note also that the efficiency improves with the number of processors.

The question now is how to find the optimal number of processors. Phipps et al. [11, 12] developed a model for the execution time needed to propagate n objects using p processors. The time $t(p)$ is given by

$$t(p) = t_{w1}(p) + t_{w2}(p) + t_c(p) \tag{3}$$

where $t_{w1}(p)$ is the time the last node must wait to receive its first data set, $t_{w2}(p)$ is the total time the last node must wait for all its subsequent data, and $t_c(p)$ is the time for each node to propagate its share of the n objects.

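It was shown there that

$$t_{w1}(p) = (p-3)\,t_m(1) \tag{4}$$

$$t_{w2}(p) = \begin{cases} 0 & \text{if } t_{w1}(p) \le t_1 \\ \left(\dfrac{n}{p-2} - 1\right)\bigl(t_{w1}(p) - t_1\bigr) & \text{if } t_{w1}(p) > t_1 \end{cases} \tag{5}$$

$$t_c(p) = \frac{n\,t_1}{p-2} \tag{6}$$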

where $t_m(1)$ is the time to send a single message between the distributing node and a working node, and $t_1$ is the time to propagate one object. These were found to be

$$t_m(1) = 0.0374 \text{ msec}, \qquad t_1 = 4.60 \text{ msec}.$$
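The model (3)-(6) is easy to evaluate numerically. The Python sketch below does so under two stated assumptions: equation (5) is taken to mean that the extra wait is $(n/(p-2) - 1)(t_{w1}(p) - t_1)$ when $t_{w1}(p) > t_1$ and zero otherwise, and the serial time is assumed to be $T_s = n t_1$ when forming $S_p$ and $E_p$. The text does not say which propagator the quoted $t_m(1)$ and $t_1$ belong to, so the printed values are illustrative and are not claimed to reproduce Figures 2-5. All function names are hypothetical.

```python
def t_w1(p, t_m):
    """Eq. (4): wait of the last node for its first data set."""
    return (p - 3) * t_m


def t_w2(n, p, t_m, t_1):
    """Eq. (5), as read here: extra waiting for later data sets, nonzero only
    when t_w1 exceeds the time t_1 needed to propagate one object."""
    w1 = t_w1(p, t_m)
    return 0.0 if w1 <= t_1 else (n / (p - 2) - 1) * (w1 - t_1)


def t_c(n, p, t_1):
    """Eq. (6): the p - 2 working nodes share the n objects."""
    return n * t_1 / (p - 2)


def t_total(n, p, t_m, t_1):
    """Eq. (3): t(p) = t_w1(p) + t_w2(p) + t_c(p)."""
    if p < 3:
        raise ValueError("the scheme needs p >= 3 processors")
    return t_w1(p, t_m) + t_w2(n, p, t_m, t_1) + t_c(n, p, t_1)


def model_efficiency(n, p, t_m, t_1):
    """E_p = S_p / p, ASSUMING the serial time is T_s = n * t_1."""
    return (n * t_1 / t_total(n, p, t_m, t_1)) / p


T_M, T_1, N = 0.0374, 4.60, 5000  # msec, msec, objects, as quoted above
for p in (8, 16, 32, 64, 128, 256):  # hypercube sizes
    print(p, round(model_efficiency(N, p, T_M, T_1), 3))
```

Once $t_{w1}(p)$ exceeds $t_1$, the distributor can no longer keep every worker busy and the $t_{w2}$ term starts to dominate, which is what bounds the useful number of processors in this model.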

Therefore the speedup and efficiency for n = 5000 objects can be plotted as a function of the number of processors. As the plots show, the maximum efficiency for P3T is 87%, achieved with 16 processors. For PSGP, the maximum efficiency is over 90%, using 128 processors.


For PSGP4 and PSDP4, the maximum efficiency (over 90%) is achieved with 64 processors. Figures 2-5 plot the efficiency of each code as a function of p. Note that each code propagates a different number of objects: SGP handles all orbits in the same way, whereas SDP4 considers only the "deep space" orbits, and the rest are handled by SGP4.
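The split between SGP4 and SDP4 can be illustrated with a small dispatch function. This is a sketch under assumptions: it takes the mean motion in revolutions per day (the convention of two-line element sets), converts it to an orbital period, and applies the 225-minute threshold quoted in the introduction. The function names are hypothetical and this is not the operational AFSPACECOM code.

```python
DEEP_SPACE_PERIOD_MIN = 225.0  # "deep space": period greater than 225 minutes
MINUTES_PER_DAY = 1440.0


def orbital_period_minutes(mean_motion_rev_per_day: float) -> float:
    """Orbital period in minutes from mean motion in revolutions per day."""
    if mean_motion_rev_per_day <= 0:
        raise ValueError("mean motion must be positive")
    return MINUTES_PER_DAY / mean_motion_rev_per_day


def choose_propagator(mean_motion_rev_per_day: float) -> str:
    """Route deep-space orbits to SDP4 and all other orbits to SGP4."""
    period = orbital_period_minutes(mean_motion_rev_per_day)
    return "SDP4" if period > DEEP_SPACE_PERIOD_MIN else "SGP4"


print(choose_propagator(15.5))    # low-Earth orbit, period about 93 min: SGP4
print(choose_propagator(1.0027))  # near-geosynchronous, period about 1436 min: SDP4
```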

As a result of discussions with AFSPACECOM, we realized that the propagator is usually called several times for each object, each call corresponding to a specified time beyond epoch. SGP4 propagates low-Earth objects, which require more frequent tracking than deep-space satellites; thus AFSPACECOM receives a relatively large number of observations per day for each low-Earth satellite. The estimated number of calls per object is 75 for SGP4 and 25 for SDP4. To analyze the speedup and efficiency, we note that each time SGP4 receives a new set of satellite data, an initialization subroutine is called before the SGP4 main subroutine. For each subsequent time specified for the same satellite, the initialization routine is not called. Thus the execution time can be modeled by (Ostrom [10])

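$$t_1 = t_I + (m-1)\,t_s \tag{7}$$

where $t_I$ is the time to propagate the satellite including initialization, $t_s$ is the propagation time without initialization, and m is the number of calls per satellite. The values of $t_I$ and $t_s$ measured on the iPSC/2 hypercube are

$$t_I = 6.6 \text{ msec}, \qquad t_s = 2.2 \text{ msec},$$

so that, with m = 75 calls,

$$t_1 = 6.6 + 74 \times 2.2 = 169.4 \text{ msec}.$$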
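Equation (7) amortizes the one-time initialization over the m calls made for each satellite. A minimal Python rendering, with a hypothetical function name, reproduces the SGP4 figure quoted above; the SDP4 inputs behind its per-satellite time are not given in the text, so they are not evaluated here.

```python
def time_per_satellite(t_init: float, t_step: float, m: int) -> float:
    """Eq. (7): one call with initialization, then m - 1 calls without it."""
    if m < 1:
        raise ValueError("each satellite needs at least one call")
    return t_init + (m - 1) * t_step


# SGP4 values measured on the iPSC/2 (msec), 75 calls per satellite:
print(round(time_per_satellite(6.6, 2.2, 75), 1))  # 169.4
```

With m = 75, initialization accounts for under 4% of the per-satellite time, so each work unit is far larger than a single propagation.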

Figure 6 depicts the speedup and efficiency versus hypercube dimension when propagating 5950 satellites 75 times each. Much higher speedups are clearly obtainable in this case: the maximum efficiency is nearly 100% with a 256-node hypercube.

A similar analysis for SDP4 (Ostrom [10]) gives $t_1 = 106.8$ msec. Using 1050 satellites (15% of a total of 7000 objects), one finds near-100% efficiency on a 128-node hypercube (see Figure 7). This analysis can be extended to PPT2.
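A short bound helps explain why these efficiencies approach, but do not reach, 100%. Assuming, as a simplification not stated in the text, that the serial time is $T_s = n t_1$, and noting from (3) and (6) that $t(p) \ge t_c(p)$:

$$E_p = \frac{T_s}{p\,t(p)} \le \frac{n\,t_1}{p\,t_c(p)} = \frac{n\,t_1\,(p-2)}{p\,n\,t_1} = \frac{p-2}{p}.$$

The ceiling reflects the distributing and collecting nodes, which do no propagation. For the 256-node SGP4 case it is $254/256 \approx 99.2\%$, and for the 128-node SDP4 case $126/128 \approx 98.4\%$.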

3 Parallel Virtual Machine

Parallel Virtual Machine (PVM) is a small software package (about 1 Mbyte of C source code) that allows a heterogeneous network of Unix-based computers to appear as a single large distributed-memory parallel computer. PVM is suited to large-grain parallelism, that is, at least 100 kbytes/node. The term virtual machine designates a logical distributed-memory computer, and host designates one of the member computers.

The PVM software, developed at Oak Ridge National Laboratory (see Dongarra et al. [3] and Sunderam et al. [15]), supplies functions that automatically start up tasks and let them communicate and synchronize with each other. A problem can be solved in parallel by sending and receiving messages to accomplish multiple tasks, much like send and receive on the hypercube.

PVM handles all message conversion that may be required if two computers use different data representations. PVM also ensures that error messages generated on a remote computer are displayed on the user's local screen.

The PVM system is composed of two parts: the daemon and a library of PVM interface routines. The daemon (pvmd or pvmd3) resides on all the computers making up the virtual machine. To run a PVM application, the user executes pvmd on one of the computers, which in turn starts up pvmd on all the others. The PVM interface library contains routines for message passing, spawning processes, coordinating tasks, and modifying the virtual machine.

4 Parallelization of PPT2 using PVM

Stone [14] tried four variants of domain (data) decomposition:

• The master sends one satellite to each working node, then sends one satellite at a time upon request (ds1).

• The master sends one satellite to each working node, then continues in round-robin fashion (ds2).

• The entire data set is divided into p blocks, where p is the number of working nodes, and the master sends one block to each working node (ds3).

• The entire data set is divided into 2p blocks; the master sends one block to each working node and then a second block to each (ds4).

The second option saves on communication. The third saves even more, because it reduces the number of times data must be sent. On the other hand, sending such large blocks forces the other nodes to wait. The last option is therefore a compromise between the previous two.
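To make the four schemes concrete, the sketch below computes which satellites each working node receives under ds1-ds4. It is a hypothetical illustration, not Stone's code: the function names are ours, ds1 is simulated with assumed per-satellite costs (whichever worker becomes free first requests the next satellite), and no messaging layer such as PVM is involved.

```python
import heapq


def ds1(costs, workers):
    """One satellite to each worker, then one at a time on request.
    Simulated with assumed costs: the first worker to become free gets the
    next satellite (ties go to the lower worker id)."""
    assign = [[] for _ in range(workers)]
    free_at = [(0.0, w) for w in range(workers)]  # (time worker is free, worker id)
    heapq.heapify(free_at)
    for sat, cost in enumerate(costs):
        t, w = heapq.heappop(free_at)
        assign[w].append(sat)
        heapq.heappush(free_at, (t + cost, w))
    return assign


def ds2(n, workers):
    """Round-robin, one satellite per message (n messages in total)."""
    return [list(range(w, n, workers)) for w in range(workers)]


def split_blocks(n, k):
    """Split satellites 0..n-1 into k contiguous blocks differing in size by at most 1."""
    base, extra = divmod(n, k)
    blocks, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


def ds3(n, workers):
    """One large block per worker (only `workers` messages)."""
    return split_blocks(n, workers)


def ds4(n, workers):
    """2p blocks: block w goes out in the first round, block p + w in the second."""
    blocks = split_blocks(n, 2 * workers)
    return [blocks[w] + blocks[workers + w] for w in range(workers)]


for name, plan in [("ds2", ds2(10, 4)), ("ds3", ds3(10, 4)), ("ds4", ds4(10, 4))]:
    print(name, plan)
```

The trade-off in the text shows up in the message counts: ds2 sends n messages, ds3 only p, and ds4 2p, with ds4's smaller first-round blocks letting workers start sooner than under ds3.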

For these experiments, PVM was started on eighteen different workstations so that measurements could be taken for one to sixteen working nodes. The workstations are SUN Sparc II and Sparc IPX machines with 40 MHz processors and 32 Mbytes of system memory, connected by a 10 Mbit/s Ethernet-based network. Stone experimented with data sets of 600 and 1200 objects; we give here the results for 1200 (Figure 8). Four working processors suffice to minimize the computing time, and the fourth variant (ds4) is the best. Stone [14] showed that a speedup of almost 6 was achieved when using 8 SUN workstations.

5 Parallelization of SGP4 using PVM

Brewer [1] tried three variants of domain (data) decomposition:

• Answer Back Method (ABM)

The master sends one block of $t_0$ satellites to each working node. Upon request, a working node receives another block of $t_0$ satellites, until the data set is processed.

• Successive Deal I (SDI)

The master sends one block of $t_0$ satellites to each working node and continues to deal such blocks in round-robin fashion.

• Successive Deal II (SDII)

The master sends one block of $t_0$ satellites to each working node. The rest of the data set is divided into 2p blocks (twice the number of working nodes), which are dealt to the working nodes in round-robin fashion (2 blocks each).

The second method eliminates the communication time spent by workers requesting more data. The third method cuts the communication overhead further. It differs from SDI with a larger $t_0$ because in SDII the large blocks are sent while the workers are busy propagating their first $t_0$ satellites.
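The message patterns of SDI and SDII can be sketched by listing the (worker, satellites) messages each scheme sends. This is a hypothetical illustration, not Brewer's code: the function names are ours, the block size $t_0 = 10$ in the usage line is an assumed value (the text does not list the values tried), and ABM is omitted because its order depends on run-time requests.

```python
def sdi(n, t0, workers):
    """Successive Deal I: deal blocks of t0 satellites round-robin until done.
    Returns the messages as (worker, satellites) pairs."""
    msgs, start, w = [], 0, 0
    while start < n:
        end = min(start + t0, n)
        msgs.append((w, list(range(start, end))))
        start, w = end, (w + 1) % workers
    return msgs


def sdii(n, t0, workers):
    """Successive Deal II: t0 satellites to each worker first, then the remainder
    split into 2p blocks dealt round-robin (two blocks per worker)."""
    msgs, start = [], 0
    for w in range(workers):  # first round: t0 satellites each
        if start >= n:
            break
        end = min(start + t0, n)
        msgs.append((w, list(range(start, end))))
        start = end
    k = 2 * workers  # remainder: 2p blocks
    base, extra = divmod(n - start, k)
    for i in range(k):
        size = base + (1 if i < extra else 0)
        if size:
            msgs.append((i % workers, list(range(start, start + size))))
            start += size
    return msgs


# 7000 satellites, 6 working nodes, assumed t0 = 10:
print(len(sdi(7000, 10, 6)), len(sdii(7000, 10, 6)))  # 700 18
```

The sketch only counts messages; the overlap the text describes, where SDII's large blocks travel while workers are busy with their first $t_0$ satellites, would require a timing model on top of it.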

We experimented with various values of $t_0$ and chose 4, 8 and 16 processors (i.e. 2, 6 and 14 working nodes, respectively). The number of satellites was taken to be 7000, 15% of which were considered deep-space. For each deep-space satellite, 25 calls were made to SDP4; for each of the other satellites, 75 calls were made to SGP4.

The first measure is the end-to-end time. It is the most important, since it reflects the total performance of each algorithm. The Answer Back Method was superior when using 4 or 8 processors; with 16 processors, ABM was faster in most cases (see Figures 9-11).

We can also look at this from another point of view. In the next three figures we plot the end-to-end time for each method. Figure 12 shows that 8 or 16 processors is the best choice (shortest time) for ABM; for SDI and SDII, 16 processors is best.

The second measure is the percentage of time a working processor spends on communication. Figures 13-15 show that SDII requires less communication time, which is not surprising. They also show that the more working nodes we have, the higher this percentage.

The third measure is efficiency. In all three cases, ABM was the most efficient. Figures 16-18 show that for each method it is more efficient to use 4 or 8 processors rather than 16.

In closing, we note that on an open network there are large fluctuations in the time taken to perform a given task, since the execution time depends on the number of current users and the percentage of the CPU allocated to each user. To partially compensate for this, we averaged 10 run times to arrive at our results.

6  Conclusions 

In this paper we have shown the benefit of MIMD parallel computers in predicting the orbits of objects. Analytic orbit propagators currently used by the Navy and the Air Force were implemented on an INTEL iPSC/2 hypercube and on a cluster of networked Unix-based workstations running PVM. The efficiency of the algorithms approaches 100% when the optimal number of processors is used. This optimal number depends on the number of satellites, the orbit propagator used, and the number of calls to the propagator per satellite. For the cluster of workstations, using PVM, we have shown that it is more efficient to use 4 or 8 workstations than 16; the speedup is almost 6 when using 8 workstations.